The dataset atp_tennis contains the following columns: ['Tournament', 'Date', 'Series', 'Court', 'Surface', 'Round', 'Best of', 'Player_1', 'Player_2', 'Winner', 'Rank_1', 'Rank_2', 'Pts_1', 'Pts_2', 'Odd_1', 'Odd_2', 'score'].  
--- The description of each column in this dataset is:
Tournament: Name of the tennis tournament (Brisbane International, Chennai Open, Qatar Exxon Mobil Open ...etc)
Date: Date the match was played (year-month-day)
Series: Category or level of the tennis tournament (ATP250, ATP500, Masters1000 and Grand Slams offer 250, 500, 1000, and 2000 ranking points to the winner, respectively.)
Court: Place the match was held (Indoors or Outdoors)
Surface: Type of court surface (Hard, Grass and Clay)
Round: Stage of the tournament (1st Round, 2nd Round, Quarterfinals, Semifinal and The Final)
Best of: Match format of the tournament ("best of 3" or "best of 5" sets)
Player_1: Name of the first competitor in each match
Player_2: Name of the second competitor in each match
Winner: Name of the Player_1 or Player_2 who won the match 
Rank_1: World rankings of the Player_1 at the time of the match
Rank_2: World rankings of the Player_2 at the time of the match
Pts_1: Ranking points held by Player_1 before the match (accumulated over the season and higher points typically indicate a more successful player)
Pts_2: Ranking points held by Player_2 before the match
Odd_1: Betting odds for Player_1 (reflecting their perceived chances of winning)
Odd_2: Betting odds for Player_2 
score: Final score of the match ("score of Player_1" - "score of Player_2")
---
Consider the contents of the dataset and the requirements from the user. Please note: DO NOT CHANGE FILE AND VARIABLE NAMES THAT I HAVE SET!
Interactions begin:
[USER (data scientist)]: Can you help me find the matches in the atp_tennis dataset where Roger Federer is either Player_1 or Player_2? I would like to have a list of Match IDs for these matches.

[YOU (AI assistant)]: Sure, I can help you with that. To find the matches where Roger Federer is either Player_1 or Player_2, you can use the following code. The DataFrame index is used as the Match ID:
'''
import pandas as pd
import numpy as np
import pickle

atp_tennis = pd.read_csv('atp_tennis.csv')

# Filter the DataFrame
federer_matches = atp_tennis[(atp_tennis['Player_1'] == 'Federer R.') | (atp_tennis['Player_2'] == 'Federer R.')]

# Convert the index to a list
federer_match_ids = federer_matches.index.tolist()

print(federer_match_ids)

'''
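
As a quick check of the filtering step above, here is a minimal sketch on a toy DataFrame. The rows in `toy_matches` and the helper `match_ids_for` are hypothetical illustrations and are not part of atp_tennis.csv. It assumes, as above, that the row index serves as the Match ID.

```python
import pandas as pd

# Toy stand-in for atp_tennis (hypothetical rows, not real match data)
toy_matches = pd.DataFrame({
    'Player_1': ['Federer R.', 'Nadal R.', 'Murray A.', 'Djokovic N.'],
    'Player_2': ['Nadal R.', 'Djokovic N.', 'Federer R.', 'Murray A.'],
    'Winner':   ['Federer R.', 'Nadal R.', 'Murray A.', 'Djokovic N.'],
})

def match_ids_for(player, data):
    """Return index labels of rows where `player` is Player_1 or Player_2."""
    # Compare both player columns against the name, then keep rows with any hit
    involved = data[['Player_1', 'Player_2']].eq(player).any(axis=1)
    return data.index[involved.to_numpy()].tolist()

toy_federer_ids = match_ids_for('Federer R.', toy_matches)
print(toy_federer_ids)
```

Comparing both columns in one step gives the same result as the explicit `|` filter and returns an empty list for a name that never appears.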

[USER (data scientist)]: Great! We have a new competitor named 'Raonic M.'. I need to find the win rates, head-to-head records against other players in the affected matches, and performance on the same surface as the tournament for this new competitor. Please provide a dataframe with Player Names, Win Rates, Head-to-Head Records, and Surface Performance.

[YOU (AI assistant)]: To find the win rates, head-to-head records, and surface performance for 'Raonic M.', you can use the following code:
'''
import pandas as pd
import numpy as np
import pickle

atp_tennis = pd.read_csv('atp_tennis.csv')

# Find Federer matches
federer_matches = atp_tennis[(atp_tennis['Player_1'] == 'Federer R.') | (atp_tennis['Player_2'] == 'Federer R.')]
federer_opponents = federer_matches['Player_1'].where(federer_matches['Player_1'] != 'Federer R.', federer_matches['Player_2']).unique()

# Retrieve unique surface types
surface_types = atp_tennis['Surface'].unique()

# Function to calculate a tennis player's overall win rate, head-to-head record, and performance on different surfaces against a specific opponent.
def calculate_performance(player, opponent, data, surface_types):
    player_matches = data[((data['Player_1'] == player) & (data['Player_2'] == opponent)) | ((data['Player_1'] == opponent) & (data['Player_2'] == player))]
    total_matches = len(player_matches)
    wins = len(player_matches[player_matches['Winner'] == player])
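    # Overall win rate: share of wins across all matches the player took part in
    all_matches = data[(data['Player_1'] == player) | (data['Player_2'] == player)]
    win_rate = (all_matches['Winner'] == player).mean() if len(all_matches) > 0 else 0

    # Head-to-head record: share of wins against this specific opponent only
    h2h_record = wins / total_matches if total_matches > 0 else 0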

    surface_performance = {}
    for surface in surface_types:
        player_surface_matches = player_matches[player_matches['Surface'] == surface]
        player_surface_wins = len(player_surface_matches[player_surface_matches['Winner'] == player])
        surface_performance[surface] = player_surface_wins / player_surface_matches.shape[0] if player_surface_matches.shape[0] > 0 else 0

    return win_rate, h2h_record, surface_performance

# Initialize an empty dataframe for storing the results
columns = ['Player', 'Opponent', 'Win Rate', 'Head-to-Head'] + [f'{surface} Surface Performance' for surface in surface_types]
player_stats = pd.DataFrame(columns=columns)

# Loop through Federer's opponents
for opponent in federer_opponents:
    # Calculate win rate, head-to-head record, and surface performance for Raonic M. against each opponent
    win_rate, h2h_record, surface_performance = calculate_performance('Raonic M.', opponent, atp_tennis, surface_types)

    # Append the results to the dataframe
    row_data = {'Player': 'Raonic M.', 'Opponent': opponent, 'Win Rate': win_rate, 'Head-to-Head': h2h_record}
    row_data.update({f'{surface} Surface Performance': surface_performance[surface] for surface in surface_types})
    player_stats = pd.concat([player_stats, pd.DataFrame(row_data, index=[0])], ignore_index=True)

# Display the resulting dataframe
print(player_stats)

'''
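
An overall win rate and a head-to-head record are different quantities. The first is taken over all of a player's matches, the second only over matches against one opponent. The sketch below illustrates the distinction on toy data. The rows and the helper names `overall_win_rate` and `head_to_head_rate` are hypothetical and only meant as an illustration.

```python
import pandas as pd

# Toy stand-in for atp_tennis (hypothetical rows, not real match data)
toy_matches = pd.DataFrame({
    'Player_1': ['Raonic M.', 'Nadal R.', 'Raonic M.', 'Murray A.'],
    'Player_2': ['Nadal R.', 'Raonic M.', 'Murray A.', 'Raonic M.'],
    'Winner':   ['Raonic M.', 'Nadal R.', 'Raonic M.', 'Raonic M.'],
})

def overall_win_rate(player, data):
    """Share of wins over every match the player appears in."""
    played = data[(data['Player_1'] == player) | (data['Player_2'] == player)]
    if played.empty:
        return 0.0
    return float((played['Winner'] == player).mean())

def head_to_head_rate(player, opponent, data):
    """Share of wins over matches between `player` and `opponent` only."""
    pair = data[((data['Player_1'] == player) & (data['Player_2'] == opponent)) |
                ((data['Player_1'] == opponent) & (data['Player_2'] == player))]
    if pair.empty:
        return 0.0
    return float((pair['Winner'] == player).mean())

print(overall_win_rate('Raonic M.', toy_matches))
print(head_to_head_rate('Raonic M.', 'Nadal R.', toy_matches))
print(head_to_head_rate('Raonic M.', 'Murray A.', toy_matches))
```

On this toy data the player wins 3 of 4 matches overall, but 1 of 2 against one opponent and 2 of 2 against the other, so the two metrics should be reported in separate columns.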

[USER (data scientist)]: Thank you. Now, I need to update the odds for each affected match, considering the new competitor's performance (win rate). Please provide a dataframe with Match IDs, Original Odds, and Updated Odds, where the odds are adjusted based on the new competitor's win rate and a performance weight factor.

[YOU (AI assistant)]: To update the odds for each affected match, considering the new competitor's performance, you can use the following code:
'''
import pandas as pd
import numpy as np
import pickle

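atp_tennis = pd.read_csv('atp_tennis.csv')

# Note: this step reuses federer_match_ids and player_stats from the previous steps,
# so those code blocks must have been run first in the same session.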

# Function to calculate head-to-head win rates between players
def calculate_head_to_head_win_rate(data):  
    unique_players = pd.concat([data['Player_1'], data['Player_2']]).unique()  
  
    total_matches = data.groupby(['Player_1', 'Player_2']).size().reset_index().rename(columns={0: 'count_total'})  
    player_wins = data[data['Winner'] == data['Player_1']].groupby(['Player_1', 'Player_2']).size().reset_index().rename(columns={0: 'count_wins'})  
  
    all_pairs = pd.MultiIndex.from_product([unique_players, unique_players], names=['Player', 'Opponent']).to_frame(index=False)  
    all_pairs = all_pairs[all_pairs['Player'] != all_pairs['Opponent']]  
      
    h2h_win_rates = all_pairs.merge(total_matches, left_on=['Player', 'Opponent'], right_on=['Player_1', 'Player_2'], how='left').drop(columns=['Player_1', 'Player_2'])  
    h2h_win_rates = h2h_win_rates.merge(player_wins, left_on=['Player', 'Opponent'], right_on=['Player_1', 'Player_2'], how='left').drop(columns=['Player_1', 'Player_2'])  
  
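    # Assign the result back instead of using inplace=True on a column selection,
    # which is deprecated chained assignment in recent pandas versions
    h2h_win_rates['count_wins'] = h2h_win_rates['count_wins'].fillna(0)  
    h2h_win_rates['count_total'] = h2h_win_rates['count_total'].fillna(1)  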
    h2h_win_rates['Head-to-Head Win Rate'] = h2h_win_rates['count_wins'] / h2h_win_rates['count_total']  
  
    h2h_win_rates.drop(columns=['count_total', 'count_wins'], inplace=True)  
  
    return h2h_win_rates

# Function to estimate win probability based on individual and head-to-head win rates
def estimate_probability(player_1_win_rate, player_2_win_rate, h2h_player_1_win_rate, h2h_player_2_win_rate, performance_weight=0.7, h2h_weight=0.3):  
    if player_1_win_rate + player_2_win_rate == 0:  
        player_1_win_rate = player_2_win_rate = 0.5  
  
    if h2h_player_1_win_rate + h2h_player_2_win_rate == 0:  
        h2h_player_1_win_rate = h2h_player_2_win_rate = 0.5  
  
    weighted_player_1_win_rate = performance_weight * player_1_win_rate + h2h_weight * h2h_player_1_win_rate  
    weighted_player_2_win_rate = performance_weight * player_2_win_rate + h2h_weight * h2h_player_2_win_rate  
    total_weighted_win_rate = weighted_player_1_win_rate + weighted_player_2_win_rate  
  
    player_1_probability = weighted_player_1_win_rate / total_weighted_win_rate  
    player_2_probability = weighted_player_2_win_rate / total_weighted_win_rate  
  
    return player_1_probability, player_2_probability

# Function to update odds based on performance weight and head-to-head weight  
def update_odds(match_id, performance_weight, h2h_weight, data, h2h_win_rates):  
    match = data.loc[match_id]  
    player_1 = match['Player_1']  
    player_2 = match['Player_2']  
  
    player_1_win_rate = player_stats.loc[player_stats['Opponent'] == player_1, 'Win Rate']  
    player_2_win_rate = player_stats.loc[player_stats['Opponent'] == player_2, 'Win Rate']  
  
    player_1_win_rate = player_1_win_rate.iloc[0] if not player_1_win_rate.empty else 0  
    player_2_win_rate = player_2_win_rate.iloc[0] if not player_2_win_rate.empty else 0  
  
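    # Match on both Player and Opponent so the rate belongs to this specific pairing
    h2h_player_1_win_rate = h2h_win_rates.loc[(h2h_win_rates['Player'] == player_1) & (h2h_win_rates['Opponent'] == player_2), 'Head-to-Head Win Rate']  
    h2h_player_2_win_rate = h2h_win_rates.loc[(h2h_win_rates['Player'] == player_2) & (h2h_win_rates['Opponent'] == player_1), 'Head-to-Head Win Rate']  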
  
    h2h_player_1_win_rate = h2h_player_1_win_rate.iloc[0] if not h2h_player_1_win_rate.empty else 0  
    h2h_player_2_win_rate = h2h_player_2_win_rate.iloc[0] if not h2h_player_2_win_rate.empty else 0  
  
    player_1_probability, player_2_probability = estimate_probability(player_1_win_rate, player_2_win_rate, h2h_player_1_win_rate, h2h_player_2_win_rate, performance_weight, h2h_weight)  
    max_odd_1 = data['Odd_1'].max()  
    max_odd_2 = data['Odd_2'].max()  
      
    if player_1_probability == 0:  
        updated_odd_1 = max_odd_1  
    else:  
        updated_odd_1 = 1 / player_1_probability  
      
    if player_2_probability == 0:  
        updated_odd_2 = max_odd_2  
    else:  
        updated_odd_2 = 1 / player_2_probability  
  
    return updated_odd_1, updated_odd_2  
  
# Calculate head-to-head win rates  
h2h_win_rates = calculate_head_to_head_win_rate(atp_tennis)  
print(h2h_win_rates)
  
# Initialize an empty DataFrame for storing the results  
updated_odds_df = pd.DataFrame(columns=['Match ID', 'Original Odd_1', 'Original Odd_2', 'Updated Odd_1', 'Updated Odd_2'])  

# Loop through the affected matches  
for match_id in federer_match_ids:  
    # Update the odds for the match  
    updated_odd_1, updated_odd_2 = update_odds(match_id, 0.7, 0.3, atp_tennis, h2h_win_rates)  
  
    # Append the results to the DataFrame  
    original_odd_1 = atp_tennis.loc[match_id, 'Odd_1']  
    original_odd_2 = atp_tennis.loc[match_id, 'Odd_2']  
    updated_odds_df = pd.concat([updated_odds_df, pd.DataFrame({'Match ID': [match_id], 'Original Odd_1': [original_odd_1], 'Original Odd_2': [original_odd_2], 'Updated Odd_1': [updated_odd_1], 'Updated Odd_2': [updated_odd_2]})], ignore_index=True)  

# Display the resulting DataFrame  
print(updated_odds_df)  

'''
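
The odds update above rests on two small pieces of arithmetic: blending an overall win rate with a head-to-head rate using two weights, and converting a probability p into decimal odds as 1 / p. The sketch below works through one example with the same default weights (0.7 and 0.3). The helper names `blend_rates` and `to_decimal_odds` and the input numbers are hypothetical, and the zero handling is simplified compared with the functions above.

```python
def blend_rates(rate_1, rate_2, h2h_1, h2h_2, performance_weight=0.7, h2h_weight=0.3):
    """Blend overall and head-to-head rates, then normalise to probabilities."""
    weighted_1 = performance_weight * rate_1 + h2h_weight * h2h_1
    weighted_2 = performance_weight * rate_2 + h2h_weight * h2h_2
    total = weighted_1 + weighted_2
    if total == 0:
        # No information about either player: treat the match as a coin flip
        return 0.5, 0.5
    return weighted_1 / total, weighted_2 / total

def to_decimal_odds(probability, fallback):
    """Decimal odds are the reciprocal of the win probability."""
    # A zero probability has no finite odds, so use the supplied fallback value
    return 1 / probability if probability > 0 else fallback

# Worked example: overall rates 0.6 vs 0.4, no head-to-head edge (0.5 each)
p1, p2 = blend_rates(0.6, 0.4, 0.5, 0.5)
odd_1 = to_decimal_odds(p1, fallback=50.0)
odd_2 = to_decimal_odds(p2, fallback=50.0)
print(round(p1, 2), round(p2, 2), round(odd_1, 2), round(odd_2, 2))
```

Here the blended scores are 0.7 * 0.6 + 0.3 * 0.5 = 0.57 and 0.7 * 0.4 + 0.3 * 0.5 = 0.43, which give decimal odds of roughly 1.75 and 2.33. Odds derived this way contain no bookmaker margin, so they are not directly comparable in level to the original Odd_1 and Odd_2 columns.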

[USER (data scientist)]: That's great. I need to compare the updated odds for the affected matches to the original odds. Please provide a dataframe with Match IDs, Original Odds, Updated Odds, and Comparison Metrics (Difference and Percentage Change).

[YOU (AI assistant)]: To compare the updated odds for the affected matches to the original odds and provide a dataframe with Match IDs, Original Odds, Updated Odds, and Comparison Metrics, you can use the following code:
'''
import pandas as pd
import numpy as np
import pickle

atp_tennis = pd.read_csv('atp_tennis.csv')

# Note: this step reuses updated_odds_df from the previous step

# Calculate the difference and percentage change between original and updated odds
updated_odds_df['Difference Odd_1'] = updated_odds_df['Updated Odd_1'] - updated_odds_df['Original Odd_1']
updated_odds_df['Difference Odd_2'] = updated_odds_df['Updated Odd_2'] - updated_odds_df['Original Odd_2']
updated_odds_df['Percentage Change Odd_1'] = (updated_odds_df['Difference Odd_1'] / updated_odds_df['Original Odd_1']) * 100
updated_odds_df['Percentage Change Odd_2'] = (updated_odds_df['Difference Odd_2'] / updated_odds_df['Original Odd_2']) * 100

print(updated_odds_df)

'''
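
The two comparison metrics can be checked by hand on a small example. The sketch below uses hypothetical odds values in a toy DataFrame named `toy_odds`. Replacing a zero original odd with NaN before dividing is an added assumption to avoid division by zero, not something the dataset is known to require.

```python
import numpy as np
import pandas as pd

# Toy odds table (hypothetical values, not taken from atp_tennis.csv)
toy_odds = pd.DataFrame({
    'Match ID': [10, 11],
    'Original Odd_1': [2.0, 1.25],
    'Updated Odd_1': [2.5, 1.0],
})

# Absolute change: updated minus original
toy_odds['Difference Odd_1'] = toy_odds['Updated Odd_1'] - toy_odds['Original Odd_1']

# Relative change in percent; a zero original odd becomes NaN instead of raising or giving inf
safe_original = toy_odds['Original Odd_1'].replace(0, np.nan)
toy_odds['Percentage Change Odd_1'] = toy_odds['Difference Odd_1'] / safe_original * 100

print(toy_odds)
```

For the first row the odd moves from 2.0 to 2.5, a difference of 0.5 or 25 percent. For the second it drops from 1.25 to 1.0, a difference of -0.25 or -20 percent.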

[USER (data scientist)]: Thank you. Now, I need to know the potential impact of the updated odds on the later rounds of the tournament. Please provide a dataframe with Round, Original Odds, Updated Odds, and Potential Impact Metrics (Changes in Match Favorites, Shifts in Tournament Dynamics).

My template of code snippet is:

---BEGIN CODE TEMPLATE---

import pandas as pd
import numpy as np
import pickle

atp_tennis = pd.read_csv('atp_tennis.csv')

# YOUR SOLUTION BEGIN:
<code1>
[COMPLETE YOUR CODE] 
</code1>
# YOUR SOLUTION END

print(impact_df)

# save data
pickle.dump(impact_df,open("./pred_result/impact_df.pkl","wb"))

---END CODE TEMPLATE---

[YOU (AI assistant)]: To analyze the potential impact of the updated odds on the later rounds of the tournament (quarterfinals, semifinals, and finals), and provide a dataframe with Round, Original Odds, Updated Odds, and Potential Impact Metrics, you can use the following code:
'''
import pandas as pd
import numpy as np
import pickle

atp_tennis = pd.read_csv('atp_tennis.csv')

# YOUR SOLUTION BEGIN:
